Calling Bullshit

🚀 The Book in 3 Sentences

This book is all about bullshit. It is split into defining, identifying, and calling bullshit in the modern world. It is an easy read, with good examples and a skeptical eye turned on everything.

🎨 Impressions

The book is basically the written version of the authors' course, but the theme is so good and presented so well that it should almost be required reading in school.

It covers many interesting ideas in an easy-to-understand way, and it is striking how widespread a phenomenon bullshit is.

The authors have a micro-lecture series on the same topic.

I wrote a shitty blog post based on that lecture series.

How I Discovered It

I came across it when I took the authors' Calling Bullshit course.

Who Should Read It?

I think it should be required reading, and it is a good read for everyone who deals with data, storytelling, or persuasion.

☘️ How the Book Changed Me

I had already taken their course, so the book was mostly a good refresher.

✍️ My Top Quotes

  • THE WORLD IS AWASH WITH bullshit, and we’re drowning in it.

  • The philosopher Harry Frankfurt recognized that the ubiquity of bullshit is a defining characteristic of our time.

  • One of the most salient features of our culture is that there is so much bullshit.

  • Nothing that you will learn in the course of your studies will be of the slightest possible use to you [thereafter], save only this, that if you work hard and intelligently you should be able to detect when a man is talking rot, and that, in my view, is the main, if not the sole, purpose of education.

  • In one of his Socratic dialogues, Euthydemus, Plato complains that the philosophers known as the Sophists are indifferent to what is actually true and are interested only in winning arguments.

  • Some specialize in eating marine snails, which are protected by hard, thick shells. To smash through these calcite defenses, mantis shrimp have evolved a spring-loading mechanism in their forelimbs that allows them to punch with enormous force. Their hammer-like claws travel 50 mph when they strike. The punch is so powerful that it creates an underwater phenomenon known as cavitation bubbles, a sort of literal Batman “KAPOW!” that results in a loud noise and a flash of light.

  • This punching power serves another purpose. Mantis shrimp live on shallow reefs, where they…

  • Mantis shrimp are essentially lobster tails with claws on the front.

  • It is merely an evolved response, a sort of instinct or reflex.

  • A sophisticated bullshitter needs a theory of mind—she needs to be able to put herself in the place of her mark. She needs to be able to think about what the others around her do and do not know. She needs to be able to imagine what impression will be created by what sort of bullshit, and to choose her bullshit accordingly.

  • We impose strong social sanctions on liars. If you get caught in a serious lie, you may lose a friend.

  • With all of these potential penalties, it’s often better to mislead without lying outright. This is called paltering. If I deliberately lead you to draw the wrong conclusions by saying things that are technically not untrue, I am paltering.

  • Within linguistics, this notion of implied meaning falls under the area of pragmatics. Philosopher of language H. P. Grice coined the term implicature to describe what a sentence is being used to mean, rather than what it means literally. Implicature allows us to communicate efficiently.

  • If you claim your toothpaste reduces plaque “by up to” 50 percent, the only way that would be false is if the toothpaste worked too well.

  • This is bullshit that hides a horrible human toll behind the verbiage of corporate lingo.

  • Brandolini’s principle was formulated by software engineer Alberto Brandolini in 2014. It states: “The amount of energy needed to refute bullshit is an order of magnitude bigger than [that needed] to produce it.”

  • “an idiot can create more bullshit than you could ever hope to refute.”

  • It is not the weaknesses in Wakefield’s paper that prove there is no autism-vaccine link: it is the overwhelming weight of subsequent scientific evidence.

  • Bullshit is not only easy to create, it’s easy to spread. Satirist Jonathan Swift wrote in 1710 that “falsehood flies, and truth comes limping after it.”

  • Taken together, Brandolini’s principle, Fanelli’s principle, and Swift’s observation tell us that (1) bullshit takes less work to create than to clean up, (2) takes less intelligence to create than to clean up, and (3) spreads faster than efforts to clean it up.

  • One of the immediately apparent problems with Wakefield’s study was the tiny sample size. His study looked at only twelve children, most of whom reportedly developed the purported syndrome shortly after receiving the MMR vaccine. It is very difficult, if not impossible, to draw meaningful conclusions about rare phenomena from such a tiny sample.

  • Still, Filippo de Strata was right that when the cost of sharing information drops dramatically, we see changes in both the nature of the information available and the ways that people interact with that information.

  • The invention of new and various kinds of communication has given a voice and an audience to many people whose opinions would otherwise not be solicited, and who, in fact, have little else but verbal excrement to contribute to public issues.

  • A satirical website published a headline proclaiming that “70% of Facebook Users Only Read the Headline of Science Stories before Commenting.” The story began by noting that most people don’t read stories before sharing them on social media either. After a couple of sentences, the text gave way to paragraph after paragraph of the standard “lorem ipsum dolor…”—random text used as filler for webpage layouts. The post was shared tens of thousands of times on social media, and we don’t know how many of those who did so were in on the joke.

  • The study found that the most successful headlines don’t convey facts, they promise you an emotional experience. The most common phrase among successful Facebook headlines, by nearly twofold, is “will make you,” as in “will break your heart,” “will make you fall in love,” “will make you look twice,” or “will make you gasp in surprise” as above.

  • MIT professor Judith Donath has observed that even when people appear to be talking about other things, they’re often talking about themselves.

  • Riffing on Allen Ginsberg, tech entrepreneur Jeff Hammerbacher complained in 2011 that “the best minds of my generation are thinking about how to make people click ads. That sucks.”

  • Social media facilitates the spread of misinformation—claims that are false but not deliberately designed to deceive.

  • In 2017, Facebook admitted that over the past two years, 126 million US users—half of the adult population and about three-quarters of its US user base—had been exposed to Russian propaganda on the site.

  • “The point of modern propaganda isn’t only to misinform or push an agenda. It is to exhaust your critical thinking, to annihilate truth.”

  • In the final days of the 2016 US election, Barack Obama talked extensively about the fake news factories in Macedonia. The people running these factories—often teenagers—created at least 140 popular fake news websites during the election. When a story went viral, it generated huge advertising revenues for the site owners. Some of them were making in excess of $5,000 per month, compared with the average Macedonian monthly salary of $371.

  • By 2018, Facebook had over two billion legitimate users—but in the same year deleted even more fake accounts: nearly three billion.

  • A second approach is governmental regulation. Some countries have already passed laws against creating or spreading fake news, but we worry about this approach for two reasons. First, it runs afoul of the First Amendment to the US Constitution, which guarantees freedom of speech. Second, who gets to determine what is fake news? If a leader doesn’t like a story, he or she could declare it fake news and pursue criminal charges against the perpetrators.

  • Harry Frankfurt, the philosopher we introduced in the preface, refined this notion a bit further. He described bullshit as what people create when they try to impress you or persuade you, without any concern for whether what they are saying is true or false, correct or incorrect.

  • The central theme of this book is that you usually don’t have to open the analytic black box in order to call bullshit on the claims that come out of it.

  • A recent study reports that in the US, unattractive individuals are more likely to be found guilty in jury trials than their attractive peers.

  • “Red sky in the morning, sailors take warning. Red sky at night, sailor’s delight.”

  • Unfortunately, one of the most frequent misuses of data, particularly in the popular press, is to suggest a cause-and-effect relationship based on correlation alone. This is classic bullshit, in the vein of our earlier definition, because often the reporters and editors responsible for such stories don’t care what you end up believing.

  • After all, 25 to 29 is a demographic in which the frequency of childbirth differs considerably across socioeconomic strata and geographic regions.

  • Perhaps you already see the problem with this inference. Students aren’t necessarily drinking more beer because they ordered a pitcher. They are probably ordering pitchers because they intend to drink more beer.

  • But if one is not careful, looking at events chronologically can be misleading. Just because A happens before B does not mean that A causes B—even when A and B are associated. This mistake is so common and has been around for so long that it has a Latin name: post hoc ergo propter hoc. Translated, this means something like “after this, therefore because of it.”

  • Vigen finds his spurious correlation examples by collecting a large number of data sets about how things change over time. Then he uses a computer program to compare each trend with every other trend. This is an extreme form of what data scientists call data dredging. With a mere one hundred data series, one can compare nearly ten thousand pairs. Some of these pairs are going to show very similar trends—and thus high correlations—just by chance.
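
To see the data-dredging effect for myself, here is a small Python sketch (my own toy setup, not Vigen's actual method): compare every pair among 100 unrelated random walks and keep the strongest correlation found.

```python
# Toy data-dredging demo: 100 independent random walks, ~5,000 unique pairs.
# Requires Python 3.10+ for statistics.correlation.
import random
import statistics

random.seed(42)

def random_walk(n=30):
    """A random walk standing in for a real-world time series."""
    x, series = 0.0, []
    for _ in range(n):
        x += random.gauss(0, 1)
        series.append(x)
    return series

series = [random_walk() for _ in range(100)]

best = 0.0
for i in range(len(series)):
    for j in range(i + 1, len(series)):
        r = statistics.correlation(series[i], series[j])
        best = max(best, abs(r))

print(f"strongest correlation found by dredging: |r| = {best:.2f}")
# Typically prints a value close to 1: independent series, yet a
# near-perfect "match" emerges from sheer volume of comparisons.
```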

  • There is a key distinction between a probabilistic cause (A increases the chance of B in a causal manner), a sufficient cause (if A happens, B always happens), and a necessary cause (unless A happens, B can’t happen).

  • WHEN ALL ELSE FAILS, MANIPULATE

  • Association does not imply causation either. That said, it is worth noticing that although correlation does not imply causation, causation does imply association. Causation may not generate a linear correlation, but it will generate some sort of association.

  • Approximately 440,000 barrels of whiskey are lost to evaporation in Scotland each year. Most people don’t know how big a whiskey barrel is (about 66 gallons), so we might do better to say that in Scotland, about 29 million gallons are lost every year to the angels. We usually encounter whiskey in 750 ml bottles rather than gallons, so perhaps we would want to report this as a loss of 150 million bottles a year.
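
The unit conversions are easy to check. A quick sketch, using the barrel and bottle sizes from the quote plus 3.785 litres per US gallon:

```python
# Checking the whisky arithmetic from the quote above.
barrels_lost = 440_000
gallons_per_barrel = 66          # as given in the quote
litres_per_gallon = 3.785        # US gallon
bottle_litres = 0.75             # standard 750 ml bottle

gallons = barrels_lost * gallons_per_barrel
bottles = gallons * litres_per_gallon / bottle_litres

print(f"{gallons / 1e6:.0f} million gallons")   # 29 million gallons
print(f"{bottles / 1e6:.0f} million bottles")   # ~147 million (~150M) bottles
```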

  • Scotch whiskies lose about 2 percent in volume per year of aging, or roughly 0.005 percent per day.

  • We are not told how many search queries Google handles per day, but estimates place this figure at about 5.5 billion queries a day. So while 0.25 percent sounds small, it corresponds to well over thirteen million queries a day. These two ways of saying the same thing have very different connotations.

  • In general, we advocate that a percentage change be reported with respect to the starting value.

  • Compared to teen drivers without passengers, the relative risk of a teen driver dying in a crash while carrying a passenger over the age of thirty-five is 0.36.

  • They estimate that, for one year, in people aged 15−95 years, drinking one alcoholic drink a day increases the risk of developing one of the 23 alcohol-related health problems by 0.5%, compared with not drinking at all.

  • The authors of the study calculated that having a single daily drink would lead to four additional cases of alcohol-related illness per 100,000 people. You would have to have 25,000 people consume one drink a day for a year in order to cause a single additional case of illness.
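
The absolute-risk arithmetic is worth redoing yourself. A minimal sketch using the numbers from the quote (the baseline rate at the end is my illustrative assumption, not a figure from the book):

```python
# Four additional cases per 100,000 daily drinkers means one case per
# 25,000 person-years of daily drinking.
extra_cases_per_100k = 4
print(100_000 / extra_cases_per_100k)   # 25000.0

# The same point from the relative-risk side: a 0.5% relative increase on
# a small baseline is a tiny absolute change.
baseline_per_100k = 900                 # assumed rate among nondrinkers
print(baseline_per_100k * 0.005)        # ~4.5 extra cases per 100,000
```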

  • People who drink two drinks a day have a relative risk of 1.07 (7 percent higher than nondrinkers) and those who drink five drinks a day have a relative risk of 1.37.

  • Even professional scientists sometimes mix up a subtle issue in this regard: the difference between percentages and percentage points. An example is by far the easiest way to illustrate the difference. Suppose that on January 1, the sales tax increases from 4 percent to 6 percent of the purchase price. This is an increase of 2 percentage points: 6% − 4% = 2%. But it is also an increase of 50 percent: The 6 cents that I now pay on the dollar is 50 percent more than the 4 cents I paid previously.

  • Second, we can put these numbers into context with a well-chosen comparison. Like influenza vaccines in an ineffective year, the use of seatbelts drops the annual risk of injury from “only” about 2 percent to 1 percent. Would you trust your health to an MD who advocated a “health tonic” in lieu of seatbelts?

  • In the year 2000, African Americans composed 12.9 percent of the US population but a staggering 41.3 percent of the inmates in US jails. Given that, it would seem to be good news that between the year 2000 and the year 2005, the fraction of the jail population that was African American declined from 41.3 percent to 38.9 percent. But the real story isn’t as encouraging as the percentage figures seem to indicate. Over this period, the number of African Americans in US jails actually increased by more than 13 percent. But this increase is obscured by the fact that over the same period, the number of Caucasians in US jails increased by an even larger fraction: 27 percent.
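
A minimal sketch of the denominator effect, simplified to two groups (so the result lands near, not exactly on, the quoted 38.9 percent):

```python
# A group's share can fall while its count rises, if the denominator
# grows faster. Growth rates below are the ones quoted above.
black_2000, white_2000 = 41.3, 58.7   # relative sizes in 2000
black_2005 = black_2000 * 1.13        # count up 13%
white_2005 = white_2000 * 1.27        # count up 27%

share_2005 = black_2005 / (black_2005 + white_2005)
print(f"African American share of jail population in 2005: {share_2005:.1%}")
# ~38.5%: a falling share despite a rising count.
```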

  • Changing denominators obscure changes in numerators.

  • The bounty program failed because people did what people always do at the prospect of a reward: They start gaming the system.

  • This problem is canonized in a principle known as Goodhart’s law. While Goodhart’s original formulation is a bit opaque, anthropologist Marilyn Strathern rephrased it clearly and concisely: When a measure becomes a target, it ceases to be a good measure.

  • Online voters selected truthiness as the word of the year in a 2006 survey run by the publishers of the Merriam-Webster dictionary. The term, coined in 2005 by comedian Stephen Colbert, is defined as “the quality of seeming to be true according to one’s intuition, opinion, or perception without regard to logic, factual evidence, or the like.” In its disregard for actual logic and fact, this hews pretty closely to our definition of bullshit.

  • In other words, mathiness, like truthiness and like bullshit, involves a disregard for logic or factual accuracy.

  • Zombie statistics are numbers that are cited badly out of context, are sorely outdated, or were entirely made up in the first place—but they are quoted so often that they simply won’t die.

  • The original claim was that over 50 percent of papers go uncited, not unread.

  • Even with this correction, the 50 percent figure is inaccurate. First, this number represents the fraction of papers uncited after four years, not the fraction that will remain uncited forever. In some fields, such as mathematics, citations scarcely begin to accrue after four years.

  • Goodhart originally expressed his law as: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

  • We should stress that a sample does not need to be completely random in order to be useful. It just needs to be random with respect to whatever we are asking about.

  • The ad copy could equivalently read, “Some people will see their insurance premiums go up if they switch to GEICO. Other people will see their premiums stay about the same. Yet others will see their premiums drop. Of these, a few people will see their premiums drop a lot. Of those who see a substantial drop, the average savings is $500.”

  • One of the most important rules for spotting bullshit: If something seems too good or too bad to be true, it probably is.

  • “One in four Americans will suffer from excessive anxiety at some point,” he explained, “but in my entire career I have only seen one patient who suffered from too little anxiety.”

  • We will call this the experienced mean class size, because it reflects the class sizes that students actually experience.
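
A tiny sketch of the idea with made-up class sizes: big classes are experienced by more students, so the student-experienced mean exceeds the mean over classes.

```python
sizes = [10, 10, 100]                                   # made-up class sizes

mean_over_classes = sum(sizes) / len(sizes)
experienced_mean = sum(s * s for s in sizes) / sum(sizes)

print(mean_over_classes)   # 40.0 -- what the university advertises
print(experienced_mean)    # 85.0 -- what the average student experiences
```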

  • The same research team found that the strong form holds on Facebook as well: 84 percent of Facebook users have fewer friends than the median friend count of their friends.
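
This is the friendship paradox. A minimal sketch on a made-up hub-and-spoke network (nothing like the real Facebook graph, but it shows the mechanism):

```python
# One popular "hub" makes almost everyone's friends better-connected
# than they themselves are.
friends = {
    "hub": ["a", "b", "c", "d", "e"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"], "e": ["hub"],
}

fewer = 0
for person, flist in friends.items():
    avg_friend_count = sum(len(friends[f]) for f in flist) / len(flist)
    if len(flist) < avg_friend_count:
        fewer += 1

print(f"{fewer} of {len(friends)} people have fewer friends than their friends")
# 5 of 6
```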

  • Five years ago, engineers at Google applied machine learning techniques to their own hiring process, in an effort to identify the most productive employees. They found a surprising result: Job performance was negatively correlated with prior success in programming contests.

  • Mathematician Jordan Ellenberg has suggested that a phenomenon called Berkson’s paradox can explain a common complaint.

  • This is Berkson’s paradox in operation. By selecting for both niceness and attractiveness, you have created a negative correlation between niceness and attractiveness among the people whom you would be willing to date.

  • Two separate instances of Berkson’s paradox—one involving whom you would date and the other involving who would date you—have created this negative correlation among your potential partners, even though there is no negative trend in the population at large.
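
Berkson's paradox is easy to reproduce in simulation. A sketch with two independent traits and a combined selection bar (the threshold is arbitrary):

```python
# Requires Python 3.10+ for statistics.correlation.
import random
import statistics

random.seed(0)
population = [(random.random(), random.random()) for _ in range(100_000)]

# Select people who clear a combined niceness + attractiveness bar.
dated = [(n, a) for n, a in population if n + a > 1.2]

nice_all, attr_all = zip(*population)
nice_sel, attr_sel = zip(*dated)

print(f"whole population: r = {statistics.correlation(nice_all, attr_all):+.2f}")
print(f"selected group:   r = {statistics.correlation(nice_sel, attr_sel):+.2f}")
# Roughly 0.00 in the population, clearly negative among the selected.
```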

  • Once you start thinking about Berkson’s paradox, you’ll see it all over the place.

  • Selection can have all sorts of interesting consequences, and when trying to understand patterns from data, it is worth thinking about whether there has been any selection bias or deliberate selection operating, and if so, how those factors affect the patterns you observe.

  • In a clinical trial designed to assess the severity of the side effects of a certain medication, the initial sample of patients may be random, but individuals who suffer side effects may be disproportionately likely to drop out of the trial and thus not be included in the final analysis. This is data censoring, a phenomenon closely related to selection bias.

  • Contrary to what everyone assumes, the Florida Stand Your Ground graphic was not intended to mislead. It was just poorly designed. This highlights one of the principles for calling bullshit that we espouse. Never assume malice or mendacity when incompetence is a sufficient explanation, and never assume incompetence when a reasonable mistake can explain things.

  • In architecture, the term “duck” refers to any building where ornament overwhelms purpose, though it is particularly common in reference to buildings that look like the products they sell.

  • Ducks decorate or obscure the meaningful data in a graphic by aiming to be cute. Glass slippers create a false sense of rigor by shoehorning one type of data into a wholly unsuitable data visualization.

  • It violates what we term the principle of proportional ink: When a shaded region is used to represent a numerical value, the size (i.e., area) of that shaded region should be directly proportional to the corresponding value.

  • We noted that bar charts are designed to tell stories about magnitudes, whereas line graphs tell stories about changes. Note also that line graphs use positions rather than shaded areas to represent quantities.

  • Again, this is pure visual bullshit: An element added to the graph to impress the viewer obscures its meaning without adding any additional information.

  • “When modern architects righteously abandoned ornament on buildings they unconsciously designed buildings that were ornament….It is all right to decorate construction, but never to construct decoration”

  • This, even though the address on each piece of mail has to be read and interpreted. For typewritten addresses, this task is reasonably easy to delegate to machines. Handwriting is harder, but the US postal service has developed a remarkable handwriting recognition system that correctly interprets handwritten addresses 98 percent of the time.

  • Complicated models do a great job of fitting the training data, but simpler models often perform better on the test data than more complicated models. The trick is figuring out just how simple of a model to use. If the model you pick is too simple, you end up leaving useful information on the table.
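
A quick numpy sketch of the train/test point (the data-generating process is invented): higher-degree polynomials always fit the training data better, but past some point they do worse on held-out data.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    x = rng.uniform(-3, 3, n)
    y = np.sin(x) + rng.normal(0, 0.3, n)   # true signal + noise
    return x, y

x_train, y_train = sample(30)
x_test, y_test = sample(200)

for degree in (1, 3, 5, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
# Training error only falls as the degree grows; test error typically
# bottoms out at a moderate degree and then climbs again.
```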

  • But extraordinary claims require extraordinary evidence. The idea that homosexual and heterosexual people have differently shaped faces due to prenatal hormone exposure is an extraordinary claim.

  • When we train machines to make decisions based on data that arise in a biased society, the machines learn and perpetuate those same biases. In situations like this, “machine learning” might better be called “machine indoctrination.”

  • This problem with adding additional variables is referred to as the curse of dimensionality. If you add enough variables into your black box, you will eventually find a combination of variables that performs well—but it may do so by chance.
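
A sketch of why piling on variables invites chance fits: with many noise features and few samples, some feature will look strongly predictive (all numbers below are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 50, 1000

X = rng.normal(size=(n_samples, n_features))   # pure-noise "predictors"
y = rng.normal(size=n_samples)                 # pure-noise outcome

best = max(abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features))
print(f"best |r| among {n_features} noise features: {best:.2f}")
# Typically ~0.4-0.5: it looks like signal, but it is guaranteed noise.
```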

  • One of the reasons that science works so well is that it is self-correcting. Every claim is open to challenge and every fact or model can be overturned in the face of evidence.

  • Shortly thereafter, the Open Science Collaboration—a large-scale collective effort among dozens of researchers—reported that they were able to replicate only 39 out of 100 high-profile experiments in social psychology.

  • Loosely speaking, a p-value tells us how likely it is that the pattern we’ve seen could have arisen by chance alone. If that is highly unlikely, we say that the result is statistically significant.

  • You and the prosecutor have each stressed different conditional probabilities. A conditional probability is the chance that something is true, given other information.

  • This confusion is so common that it has its own name, the prosecutor’s fallacy. Our story illustrates why. It can be a matter of life and death in the courtroom, but it’s also a common source of confusion when interpreting the results of scientific studies.

  • Scientists are stuck using p-values because they don’t have a good way to calculate the probability of the alternative hypothesis.

  • Purely as a matter of convention, we often use a p-value of 0.05 as a cutoff for saying that a result is statistically significant. In other words, a result is statistically significant when p < 0.05, i.e., when it would have less than 5 percent probability of arising due to chance alone.
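
A simulation sketch of what the 0.05 convention means (my own toy z-test, not code from the book): when the null hypothesis is true, about 5 percent of experiments still cross the threshold by chance.

```python
import math
import random

random.seed(3)

def z_test_p(sample, mu=0.0, sigma=1.0):
    """Two-sided p-value for a z-test with known sigma."""
    z = (sum(sample) / len(sample) - mu) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

trials, hits = 10_000, 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(20)]   # null is true
    if z_test_p(sample) < 0.05:
        hits += 1

print(f"'significant' results under the null: {hits / trials:.1%}")   # ~5%
```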

  • We sometimes call it the base rate fallacy because it involves ignoring the base rate of the disease in a population when interpreting the results of a test.

  • The epithet “cafeteria Catholic” suggests a worshipper who picks and chooses from the tenets of the faith, ignoring those that are uncomfortable or unpleasant.

  • So how do you tell whether a scientific article is legitimate? The first thing to recognize is that any scientific paper can be wrong. That’s the nature of science; nothing is above questioning. No matter where a paper is published, no matter who wrote it, no matter how well supported its arguments may be, any paper can be wrong.

  • While it’s an important part of the scientific process, peer review does not guarantee that published papers are correct.

  • Science is a cumulative process. Even though experiments are not often directly replicated, science proceeds when researchers build upon previous results. If a result is false, people cannot build on it effectively. Their attempts will fail, they’ll go back and reassess the original findings, and in this way the truth will come out.

  • One system, tested by London’s Metropolitan Police, was described as having a 0.1 percent error rate based on its rate of correctly identifying true negatives. But only eight of the twenty-two suspects apprehended by using the system were true positives, for a whopping error rate of 64 percent among the positive results returned.
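
The arithmetic behind this is pure base-rate math. A sketch with a crowd size chosen to roughly reproduce the quoted figures (the crowd size is my assumption, not a number from the book):

```python
false_positive_rate = 0.001     # the quoted "0.1 percent error rate"
innocents_scanned = 14_000      # assumed crowd size (my guess)
true_hits = 8                   # true positives, as quoted

false_alarms = false_positive_rate * innocents_scanned   # ~14
alerts = true_hits + false_alarms                        # ~22
print(f"share of alerts that are wrong: {false_alarms / alerts:.0%}")   # 64%
```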

  • If researchers want to test multiple hypotheses, there are statistical methods such as the Bonferroni correction that allow this. Each individual test then requires stronger evidence to be considered significant, so that there is roughly a one-in-twenty chance that any of the tested hypotheses would appear significant if the null hypothesis were true.
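
A simulation sketch of the Bonferroni logic with twenty tests (under the null, p-values are uniform on [0, 1]):

```python
import random

random.seed(7)
m, trials = 20, 10_000
naive = strict = 0
for _ in range(trials):
    pvals = [random.random() for _ in range(m)]   # all nulls are true
    naive += min(pvals) < 0.05        # any hit at the usual threshold
    strict += min(pvals) < 0.05 / m   # any hit at the Bonferroni threshold

print(f"any false alarm at p < 0.05:   {naive / trials:.0%}")    # ~64%
print(f"any false alarm at p < 0.05/m: {strict / trials:.0%}")   # ~5%
```
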
  • Journalists are trained to ask the following simple questions about any piece of information they encounter: Who is telling me this? How does he or she know it? What is this person trying to sell me?

  • Think back to philosopher Harry Frankfurt’s distinction between bullshit and lies. Lies are designed to lead away from the truth; bullshit is produced with a gross indifference to the truth.

  • For a delightful introductory course on Fermi estimation, see Lawrence Weinstein and John A. Adam, Guesstimation (Princeton, N.J.: Princeton University Press, 2008).

  • Confirmation bias is the tendency to notice, believe, and share information that is consistent with our preexisting beliefs.

  • In our experience, patterns based on human behavior tend to be noisy. We might expect to see a general tendency toward one or the other types of description for each gender, but we’d expect some crossover.

  • We define calling bullshit as follows: Calling bullshit is a performative utterance in which one repudiates something objectionable. The scope of targets is broader than bullshit alone. You can call bullshit on bullshit, but you can also call bullshit on lies, treachery, trickery, or injustice.

  • “There can be no liberty for a community which lacks the means by which to detect lies.”

  • Reductio ad absurdum. This strategy, which dates back at least to Aristotle, shows how your opponent’s assumptions can lead to ridiculous conclusions.

  • We heard about a great counterexample raised at a Santa Fe Institute workshop on immune systems. One physicist at the meeting had learned a little bit about the immune system and had created a mathematical model to account for it. In his talk, he described his model and stressed its powerful implications. Not only are immune systems like ours useful for dealing with pathogens, he explained, but they are absolutely necessary. In order to survive in environments overrun with pathogens, he predicted, long-lived multicellular organisms such as ourselves must necessarily have immune systems with certain distinctive features. For example, the model suggested that long-lived organisms will need to have cells that detect the presence of viral infection, and other cells that produce a wide range of antibodies, generated randomly and then selected to proliferate should they match the invading pathogen. At first blush, the argument may have seemed reasonable. Plus, there was fancy mathematics behind it! An immunologist in the room seemed unconvinced. It is at this point that someone will often raise their hand to question the assumptions of the mathematical model or ask for clarification of the analysis; extensive technical discussion and often disagreements on the mathematical details follow. The immunologist—with decades of experience designing and conducting experiments, reading and reviewing thousands of papers, and teaching courses on immunology to college students—took a different approach. Instead he asked a question that required no more than a basic education in biology. He raised his hand and asked the speaker: “But what about trees?”

  • Once we reach our physical peak sometime in our twenties or thirties, our performance for most physical and cognitive tasks begins to decline. Biologists call this process senescence.

  • A null model helps us understand what we would observe in a very simple system where not much is going on. In this case we can use a computer simulation to create a pretend world where age doesn’t affect running speed.

  • That is what null models do. The point of a null model is not to accurately model the world, but rather to show that a pattern X, which has been interpreted as evidence of a process Y, could actually have arisen without Y occurring at all.
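
A sketch of a null model in that spirit: speed is independent of age and only participation drops with age, yet the fastest time per age group still "declines" (the participation counts and speed distribution are invented).

```python
import random

random.seed(11)
for age, n_runners in [(30, 2000), (50, 500), (70, 50), (90, 5)]:
    speeds = [random.gauss(10.0, 1.5) for _ in range(n_runners)]   # km/h
    print(f"age {age}: fastest of {n_runners:4d} runners = {max(speeds):.1f} km/h")
# The maximum of a small sample tends to be smaller, so the best-runner
# curve falls with age even though no individual slows down.
```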

  • The first rule of arguing on the Internet seems to be “Always double down on your own stupidity,” but we strongly discourage this practice.